389 research outputs found

The NLP Engine: A Universal Turing Machine for NLP

Author: Hovy Eduard
Li Jiwei
Publication date: 28/02/2015

It is commonly accepted that machine translation is a more complex task than part-of-speech tagging. But how much more complex? In this paper we attempt to develop a general framework and methodology for computing the informational and/or processing complexity of NLP applications and tasks. We define a universal framework akin to a Turing Machine that attempts to fit (most) NLP tasks into one paradigm. We calculate the complexities of various NLP tasks using measures of Shannon entropy, and compare 'simple' ones such as part-of-speech tagging to 'complex' ones such as machine translation. This paper provides a first, though far from perfect, attempt to quantify NLP tasks under a uniform paradigm. We point out current deficiencies and suggest some avenues for fruitful research.

arXiv.org e-Print Archive

Reflections on Sentiment/Opinion Analysis

Author: Hovy Eduard
Li Jiwei
Publication date: 06/07/2015

In this paper, we describe possible directions for deeper understanding, helping bridge the gap between psychology / cognitive science and computational approaches in the sentiment/opinion analysis literature. We focus on the opinion holder's underlying needs and their resultant goals, which, in a utilitarian model of sentiment, provide the basis for explaining the reason a sentiment valence is held. While these thoughts are still immature, scattered, unstructured, and even imaginary, we believe that these perspectives might suggest fruitful avenues for various kinds of future work.

arXiv.org e-Print Archive

Unsupervised Ranking Model for Entity Coreference Resolution

Author: Hovy Eduard
Liu Zhengzhong
Ma Xuezhe
Publication date: 15/03/2016

Coreference resolution is one of the first stages in deep language understanding, and its importance has been well recognized in the natural language processing community. In this paper, we propose a generative, unsupervised ranking model for entity coreference resolution by introducing resolution mode variables. Our unsupervised system achieves a 58.44% F1 score on the CoNLL metric on the English data from the CoNLL-2012 shared task (Pradhan et al., 2012), outperforming the Stanford deterministic system (Lee et al., 2013) by 3.01%. Comment: Accepted by NAACL 201

arXiv.org e-Print Archive

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

Author: Hovy Eduard
Ma Xuezhe
Zhou Chunting
Publication date: 05/01/2019

Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. However, recent studies demonstrate that, when equipped with expressive generative distributions (a.k.a. decoders), VAE suffers from learning uninformative latent representations, an observation known as KL vanishing, in which case VAE collapses into an unconditional generative model. In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior capability of density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning. Comment: Published at ICLR-2019. 12 pages contents + 4 pages appendix, 5 figure

arXiv.org e-Print Archive

TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice Questions

Author: Hovy Eduard
Jauhar Sujay Kumar
Turney Peter
Publication date: 11/02/2016

We describe two new related resources that facilitate modelling of general knowledge reasoning in 4th grade science exams. The first is a collection of curated facts in the form of tables, and the second is a large set of crowd-sourced multiple-choice questions covering the facts in the tables. Through the setup of the crowd-sourced annotation task we obtain implicit alignment information between questions and tables. We envisage that the resources will be useful not only to researchers working on question answering, but also to people investigating a diverse range of other applications such as information extraction, question parsing, answer type identification, and lexical semantic modelling. Comment: Keywords: Data, General Knowledge, Tables, Question Answering, MCQ, Crowd-sourcing, Mechanical Tur

arXiv.org e-Print Archive

The Profiling Machine: Active Generalization over Knowledge

Author: Hovy Eduard
Ilievski Filip
Vossen Piek
Xie Qizhe
Publication date: 01/10/2018

The human mind is a powerful multifunctional knowledge storage and management system that performs generalization, type inference, anomaly detection, stereotyping, and other tasks. A dynamic KR system that appropriately profiles over sparse inputs to provide complete expectations for unknown facets can help with all these tasks. In this paper, we introduce the task of profiling, inspired by theories and findings in social psychology about the potential of profiles for reasoning and information processing. We describe two generic state-of-the-art neural architectures that can be easily instantiated as profiling machines to generate expectations and applied to any kind of knowledge to fill gaps. We evaluate these methods against Wikidata and crowd expectations, and compare the results to gain insight into the nature of knowledge captured by various profiling methods. We make all code and data available to facilitate future research. Comment: AAAI201

arXiv.org e-Print Archive

Enriching WordNet concepts with topic signatures

Author: Agirre Eneko
Ansa Olatz
Hovy Eduard
Martinez David
Publication date: 19/09/2001

This paper explores the possibility of enriching the content of existing ontologies. The overall goal is to overcome the lack of topical links among concepts in WordNet. Each concept is to be associated with a topic signature, i.e., a set of related words with associated weights. The signatures can be automatically constructed from the WWW or from sense-tagged corpora. Both approaches are compared and evaluated on a word sense disambiguation task. The results show that it is possible to construct clean signatures from the WWW using some filtering techniques. Comment: Author list correcte

arXiv.org e-Print Archive

MaCow: Masked Convolutional Generative Flow

Author: Hovy Eduard
Kong Xiang
Ma Xuezhe
Zhang Shanghang
Publication date: 26/10/2019

Flow-based generative models, conceptually attractive due to tractability of both the exact log-likelihood computation and latent-variable inference, and efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, the density estimation performance of flow-based generative models falls significantly behind that of state-of-the-art autoregressive models. In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution. By restricting the local connectivity in a small kernel, MaCow enjoys the properties of fast and stable training, and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models. Comment: In Proceedings of Thirty-third Conference on Neural Information Processing Systems (NeurIPS-2019

arXiv.org e-Print Archive

Summarization evaluation using transformed Basic Elements

Author: Eduard Hovy
Stephen Tratz
Publication date: 01/01/2008

This paper describes BEwTE (Basic Elements with Transformations for Evaluation), an automatic system for summarization evaluation. BEwTE is a new, more sophisticated implementation of the BE framework that uses transformations to match BEs (minimal-length, syntactically well-formed units) that are lexically different yet semantically similar. We demonstrate the effectiveness of BEwTE using DUC and TAC datasets.

CiteSeerX

EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference

Author: Hovy Eduard
Naik Aakanksha
Ravichander Abhilasha
Rose Carolyn
Publication date: 26/10/2019

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding. Comment: To appear at CoNLL 201

arXiv.org e-Print Archive